An efficient labeling tool for the QuickSig speech database

Authors

  • Matti Karjalainen
  • Toomas Altosaar
  • Miikka Huttunen
Abstract

An automated speech signal labeling tool, developed for the QuickSig speech database environment, is described. It is based primarily on the use of neural networks as diphone event detectors. For robustness, only coarse categories of diphones, such as stop–vowel and vowel–nasal, are used. Sixty-four such detectors are implemented to cover all of the Finnish diphones. The speech signals are preprocessed using warped linear prediction, and the diphone events from the neural network outputs are matched to the given text transcription using a simple rule-based parser. In isolated-word labeling of single-speaker signals, a well-trained system makes coarse labeling errors at a rate of about 1-2%, and boundary positions deviate from careful manual labeling by about 10 ms on average. The system's ability to generalize to other speakers appears promising.
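
The matching step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not give the coarse category inventory, the detector output format, or the parser rules, so the `COARSE_CLASS` table, the `(time_ms, label)` event format, and the greedy in-order matching are hypothetical stand-ins rather than the authors' method.

```python
# Hypothetical phoneme-to-coarse-class table; the paper's real inventory is not
# given in the abstract.
COARSE_CLASS = {
    "a": "vowel", "e": "vowel", "i": "vowel", "o": "vowel", "u": "vowel",
    "k": "stop", "p": "stop", "t": "stop",
    "m": "nasal", "n": "nasal",
    "s": "fricative",
}


def expected_diphones(phonemes):
    """Map a phoneme sequence to the coarse diphone labels expected in order."""
    classes = [COARSE_CLASS[p] for p in phonemes]
    return [f"{a}-{b}" for a, b in zip(classes, classes[1:])]


def align(events, expected):
    """Greedily match detected events to the expected labels, in time order.

    events   -- list of (time_ms, label) pairs, assumed sorted by time
    expected -- list of coarse diphone labels from the transcription
    Returns a list of (label, time_ms), with time_ms None when no later
    event carries the expected label.
    """
    result = []
    start = 0
    for label in expected:
        j = start
        # Skip spurious detections that do not match the expected label.
        while j < len(events) and events[j][1] != label:
            j += 1
        if j < len(events):
            result.append((label, events[j][0]))
            start = j + 1
        else:
            result.append((label, None))
    return result


if __name__ == "__main__":
    # Made-up detector output for the word "kana", including one spurious event.
    detected = [(40, "stop-vowel"), (95, "vowel-fricative"),
                (130, "vowel-nasal"), (210, "nasal-vowel")]
    print(align(detected, expected_diphones(list("kana"))))
```

Using the transcription to fix the expected label sequence means a spurious detection is skipped instead of shifting every later boundary, which is one plausible reason for matching against the text rather than trusting the detectors alone.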

Similar resources

Finnish and Estonian Speech Applications Developed on an Object-Oriented Speech Processing and Database System

QuickSig, an object oriented signal processing system that represents a modern tool with which to perform DSP related studies, is presented. It empowers speech scientists to operate in a flexible and motivating environment where signals, filters, spectrograms, etc., are all modelled as objects. Seamlessly integrated to QuickSig is an object-oriented database that permits signals along with thei...

Object-oriented Access to the Estonian Phonetic Database

The paper introduces the Estonian Phonetic Database developed at the Laboratory of Phonetics and Speech Technology of the Institute of Cybernetics at the Tallinn Technical University, and its integration into QuickSig – an object-oriented speech processing environment developed at the Acoustics Laboratory of the Helsinki University of Technology. Methods of database access are discussed, relati...

Design and Implementation of an Intelligent Part of Speech Generator

The aim of this paper is to report on an attempt to design and implement an intelligent system capable of generating the correct part of speech for a given sentence while the sentence is totally new to the system and not stored in any database available to the system. It follows the same steps a normal individual does to provide the correct parts of speech using a natural language processor. It...

A Comparative Study of Gender and Age Classification in Speech Signals

Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...

Introducing a web application for labeling, visualizing speech and correcting derived speech signals

The advent of HTML5 has sparked a great increase in interest in the web as a development platform for a variety of different research applications. Due to its ability to easily deploy software to remote clients and the recent development of standardized browser APIs, we argue that the browser has become a good platform to develop a speech labeling tool for. This paper introduces a preliminary v...

Publication date: 1998